ABSTRACT

Concern over the validity of statistical tests
performed on data that may not satisfy underlying assumptions has
prompted methodological researchers to perform Monte Carlo studies
for frequently used tests. Unfortunately, these studies appear to
have had little impact on methodological practice. One reason is the
lack of an overarching framework to guide the interpretation of Monte
Carlo studies for the same test. Another is the impressionistic
nature of these studies, which can lead different readers to
different conclusions. These shortcomings can be addressed using
quantitative methods of research synthesis (e.g., meta-analysis) to
summarize the results of Monte Carlo studies for a statistical test.
In this paper, these methods are applied to a sample of Monte Carlo
studies of the F-test in the oneway fixed-effects analysis of
variance (ANOVA) model. The present analyses were based on Monte
Carlo studies reported in 21 out of 30 journal articles. The results
provide empirical support for the robustness of the Type I error rate
of the F-test to certain assumption violations. However, the Type I
error rate of the F-test was noticeably affected by unequal
variances, even when sample sizes were equal. Recommendations for
using this test when certain assumptions are violated are made. Four
data tables and one bar graph are included. (Author/TJH)
Abstract

Concern over the validity of statistical tests performed on data
that may not satisfy underlying assumptions has prompted methodological
researchers to perform Monte Carlo studies for frequently used tests.
Unfortunately, these studies appear to have had little impact on
methodological practice. One reason is the lack of an overarching
framework to guide the interpretation of Monte Carlo studies for the same
test. Another is the impressionistic nature of these studies, which can
lead different readers to different conclusions. These shortcomings can
be addressed using quantitative methods of research synthesis (e.g.,
meta-analysis) to summarize the results of Monte Carlo studies for a
statistical test. In this paper, these methods are applied to a sample
of Monte Carlo studies of the F-test in the oneway fixed-effects ANOVA
model. The results provide empirical support for the robustness of the
type I error rate of the F-test to certain assumption violations.
However, the type I error rate of the F-test was noticeably affected by
unequal variances, even when sample sizes were equal. Recommendations
for using this test when certain assumptions are violated are made.
Summarizing Monte Carlo Results in Methodological Research:
The Oneway Fixed-Effects ANOVA Case

Introduction 

An ongoing concern of quantitative methodologists is the validity of 
statistical tests performed on data that may not satisfy underlying 
assumptions (e.g., normality of a population score distribution). These
concerns have been heightened by recent work suggesting that the bulk of
educational and psychological data are at least moderately and sometimes
strikingly nonnormal (Micceri, 1989). Micceri's work is evidence of the
usefulness of statistical tests which are insensitive to assumption
violations, i.e., whose type I error properties are not deleteriously
affected. Tests which are insensitive to assumption violations are 
considered to be robust; tests which are not robust are less useful. 

A large number of Monte Carlo (MC) studies of particular statistical
tests are available in the methodological research literature.
Unfortunately, these results lack an overarching framework to guide their
interpretation. In addition, the impressionistic nature of MC results
makes it possible for different readers to reach different conclusions.
These shortcomings can be addressed by using quantitative methods of 
research synthesis (e.g., meta-analysis) (Harwell, 1990): the goal is to 
quantitatively summarize the results of MC studies for a statistical test 
in a way that generates guidelines for using that test under specific 
assumption violations. This would also permit the results of previous 
statistical analyses using that test to be evaluated. 

The purpose of this paper is to apply the meta-analytic framework
illustrated in Harwell (1990) to summarize the results of a sample of MC
studies of the F-test in the oneway fixed-effects ANOVA model. The paper
is organized following the framework illustrated in Cooper (1982) and
used in Harwell (1990). Only the type I error case is examined in this
paper.

First, previous attempts to summarize MC studies of the F-test in
the oneway fixed-effects ANOVA model are briefly reviewed. The need to
complement qualitative summaries of MC studies with quantitative methods
is emphasized. Next, data collection procedures and issues are
discussed. Then, data evaluation procedures which are used to ensure
accurate coding and data entry are discussed. Finally, the MC data are
analyzed and the results interpreted. These results inform
methodological practice by generating guidelines for using the F-test in
the oneway fixed-effects ANOVA model under specific assumption
violations.



Problem Formulation 

A number of MC studies of the F-test are available in the 
methodological research literature. However, previous attempts to
summarize these results have been narrative in nature and have lacked an
overarching framework to guide their interpretation. These shortcomings
can be addressed using methods conceptualized by Glass (1976), who
suggested using standardized mean differences (i.e., effect magnitudes)
as a way of summarizing study results. In the present context, the
empirical proportions of rejections (i.e., empirical type I errors) serve
as effect magnitudes (EMs) (Harwell, 1990). The goal of these methods is
to produce an empirical network of MC results which will generate
guidelines for the use of the F-test under specific assumption
violations.

The F-test was selected for two reasons. First, comparing the
meta-analytic results to known theoretical and empirical results for this
popular test permits the usefulness of the meta-analytic methods to be
evaluated. Second, these methods will be used to investigate the effect
of heterogeneous variances on the F-test when sample sizes are equal.
Recent MC evidence (e.g., Tomarken & Serlin, 1987) casts doubt on the
oft-cited conclusion of Glass, Peckham, and Sanders (1972) that, in the
presence of equal samples, there is a "very slight effect on $\alpha$ [the
nominal type I error rate], which is seldom disturbed by more than a few
hundredths".

Data Collection 

Selection of Studies 

A population of MC studies of the F-test in the oneway fixed-effects
ANOVA model was identified by searching the ERIC data base, Dissertation
Abstracts International, and the Current Index to Statistics. Key words
used to locate relevant studies follow: ANOVA, distribution-free,
Kruskal-Wallis, Monte Carlo, nonnormality, nonparametric, power, ranks,
robustness, simulation, t-tests, Type I error rate, Wilcoxon, and Welch.
The literature search yielded approximately thirty journal articles and
four dissertations that appeared to be accessible.

Searching large data bases does not ensure that all relevant studies
will be identified. For example, MC results reported in unpublished
technical reports and master's theses are likely to be underrepresented or
missed completely. Under these conditions, the MC studies included in
the meta-analysis may differ in some important way from those not
included. The nature of MC studies, however, makes it probable that the
potentially nonrandom sample of MC studies of the F-test is
representative of the specified population.

The small number of accessible studies yielded by the literature
search led to the decision to use every available study in the
meta-analysis. Note that one or more study selection biases may be introduced
if the identified population of studies is not representative of the
entire population of MC studies of the F-test in the oneway
fixed-effects ANOVA model (Harwell, 1990).

The present analyses were based upon MC studies reported in twenty- 
one of the thirty journal articles. These articles are listed in 
appendix A. The data reported in the remaining articles and the 
dissertations are not yet available for statistical analysis. Hence the 
conclusions in this paper are preliminary and could change with the 
inclusion of the remaining MC studies. 

Next, the twenty-one MC studies were screened for serious
methodological flaws. The fact that all twenty-one studies were
published in refereed journals provides some protection. In addition,
each study was examined for inconsistent or unusual procedures and
results using the following criteria: a) how the data were generated
(e.g., random number generator used), b) evidence of the success of the
data generation (e.g., skewness and kurtosis statistics computed for the
simulated data), and c) the pattern of empirical type I error results
when underlying assumptions of the F-test were satisfied (e.g., whether
the empirical type I error rate of the F-test converged toward the
nominal value as sample size increased if all assumptions are satisfied).
No irregularities were noted and thus all twenty-one studies were judged
to be methodologically sound.



Coding of Outcome and Explanatory Variables 

The outcome variable for the meta-analysis was type I error rate.
This variable was coded directly. Only results associated with a nominal
level of .05 were coded. Several characteristics of the MC studies were
coded as explanatory (i.e., predictor) variables. They are listed below:



(1) type of population score distribution

    normal ($\gamma_1 = 0$, $\gamma_2 = 0$)
    uniform ($\gamma_1 = 0$, $\gamma_2 = -1.12$)
    double-exponential ($\gamma_1 = 0$, $\gamma_2 = 3$)
    log-normal ($\gamma_1$, $\gamma_2$ depend on the parameters used)
    Cauchy ($\gamma_1 = 0$, $\gamma_2$ undefined)
    exponential ($\gamma_1 = 2$, $\gamma_2 = 6$)
    logistic ($\gamma_1 = 0$, $\gamma_2 = 4.2$)
    t ($\gamma_1 = 0$, $\gamma_2 = \nu/(\nu - 4)$, $\nu$ = error degrees of freedom)
    mixed normal ($\gamma_1$, $\gamma_2$ defined for each application)
    other [the other category includes the binomial distribution ($\gamma_1$, $\gamma_2$
        depend on the proportion of successes and sample size) and the
        Poisson distribution ($\gamma_1$, $\gamma_2$ depend on the parameter specified)]

(2) number of groups

(3) total sample size

(4) ratio of largest to smallest sample size

    1 = 1 (sample sizes equal)
    2 = > 1 and < 1.25
    3 = > 1.25 and < 1.5
    4 = > 1.5 and < 1.75
    5 = > 1.75 and < 2.0
    6 = > 2.0 and < 3
    7 = > 3 and < 5
    8 = > 5

(5) ratio of largest to smallest variance

    1 = 1 (all variances equal)
    2 = > 1 and < 2
    3 = > 2 and < 3
    4 = > 3 and < 5
    5 = > 5 and < 8
    6 = > 8

(6) pairing of sample size and variance

    1 = positively correlated (e.g., large variances paired with large
        sample sizes)
    2 = negatively correlated (e.g., large variances paired with smaller
        samples)
    3 = other

(7) number of samples (replications)

The population score distribution information was captured by coding
skewness ($\gamma_1$) and kurtosis ($\gamma_2$) values (Kendall & Stuart, 1977, Vol. I,
pp. 187-189). The $\gamma_1$ and $\gamma_2$ indices for the unimodal but skewed and
kurtic lognormal distribution depend on the selected parameters and hence
two MC studies employing a lognormal distribution may be examining quite
different distributions. Similarly, the $\gamma_1$ and $\gamma_2$ indices for the
binomial and Poisson distributions depend on the parameters specified in
the MC study and if this information was not reported, which was often
the case, these indices could not be coded. The kurtosis associated with
a Cauchy distribution could not be coded since the variance theoretically
does not exist.
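As an illustration of the skewness and kurtosis indices referred to above, the following minimal sketch computes moment-based estimates of $\gamma_1$ and $\gamma_2$ from a set of scores. It assumes the common definitions $\gamma_1 = m_3 / m_2^{3/2}$ and $\gamma_2 = m_4 / m_2^2 - 3$, where $m_r$ is the r-th central moment; the function name `skewness_kurtosis` is hypothetical and is not taken from any of the MC studies.

```python
def skewness_kurtosis(values):
    """Return (g1, g2): moment-based skewness and excess kurtosis.

    Assumes g1 = m3 / m2**1.5 and g2 = m4 / m2**2 - 3, with m_r the
    r-th central moment computed with divisor n.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return g1, g2


# Usage: a small symmetric set of scores has zero skewness.
g1, g2 = skewness_kurtosis([1, 2, 3, 4, 5])
print(g1, g2)
```

Applied to simulated scores, indices of this kind are one way to check whether generated data resemble the intended population distribution.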

The selection of ranges for coding the pattern of sample sizes and
the pattern of variances was guided by conditions reported in the sample
of MC studies. The number of replications variable is the number of
randomly generated samples upon which the empirical type I error values
are based. This variable was coded since it is related to the magnitude
of sampling error of the empirical proportions of rejections.
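The link between the number of replications and sampling error can be made concrete with a short sketch. It assumes that an empirical type I error rate is a binomial proportion, so that its standard error is $\sqrt{p(1-p)/R}$ for a true rejection rate $p$ and $R$ replications; the function name and the replication counts used in the usage lines are illustrative only.

```python
import math


def type1_standard_error(p, replications):
    """Standard error of an empirical rejection proportion.

    Assumes independent replications, so the number of rejections is
    binomial with success probability p.
    """
    return math.sqrt(p * (1.0 - p) / replications)


# Usage: the same nominal rate estimated from few or many replications.
for reps in (2000, 10000):
    print(reps, type1_standard_error(0.05, reps))
```

Under this assumption, a fivefold increase in replications shrinks the standard error by a factor of about 2.2, which is why studies with very different replication counts yield results of different precision.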

Data Evaluation 

Accuracy of Coding and Data Entry 

A three-phase process was used to ensure that the characteristics of
each MC study were accurately coded and correctly entered into a computer
data file in preparation for statistical analysis. In an initial
training phase, two of the twenty-one MC studies were reviewed and coded
by all four authors. The structure of one of these studies was
relatively simple and the other was more complex. Coding forms based on
the above coding scheme were completed for each study by each author.
The completed coding forms were then compared. Instances of uncertainty
or disagreement over particular characteristics of an MC study (e.g., how
sample sizes and variances were paired) were resolved by group consensus.
Information from this training phase was used to modify the coding forms.

In the next phase, eight of the twenty-one MC studies were equally 
divided among two teams of coders, each made up of two of the authors. 
The members of a team independently reviewed and coded each article
assigned to them using the modified coding forms. Members of a team then
compared their results and attempted to resolve discrepancies among 
themselves. Only a few instances of inconsistent coding were 
encountered. Each of the remaining MC studies was coded by one of the 
authors. 

In the third phase, the coded MC data were entered into a computer 
data file and then checked for accuracy. The twenty-one MC studies 
generated approximately 553 lines of data (i.e., 553 EMs). The size and
complexity of the data set virtually guaranteed errors in data entry.
Two strategies were used to detect and correct data entry errors. First,
a computer printout of the entire data file was scanned in order to
detect obvious errors, e.g., type I error values falling outside an
expected range. Second, a comprehensive check was carried out by
randomly assigning the twenty-one articles to the four authors, having
each author read the articles assigned to them, and check the coded data.
Errors in coding and data entry detected in this fashion were then
corrected.

Data Analysis and Interpretation 

The goal of quantitatively summarizing MC results for a particular
statistical test is to construct a statistical model that explains the
behavior of that statistical test as a function of study characteristics
(Harwell, 1990). Recall that available analytic and empirical evidence
of the behavior of the F-test will be directly compared against the
meta-analytic results. This will provide evidence about the usefulness of the
proposed methods. The relationship between heterogeneous variances and
type I error when sample sizes are equal will also be investigated.

Table 1 +@

Summary Statistics for Quantitative Variables
for the Sample of Monte Carlo Studies

Variable   Cases   Mean     Median   Stdev    Minimum   Maximum
TYPEI        553   .059     .050     .039     .004      .309
SKEW        1056   1.08     0        1.78     0         6.19
KURT        1056   11.14    0        28.4     -3.75     110.9
TOTALN      1225   45.4     32       43.5     8         750
REPS1       1070   4289.6   2000     4110.5   A 00      10,000

+ Cases = number of MC cases, Stdev = standard deviation.
@ Table 1 results include power values which were not considered in the inferential
analyses later.

Descriptive Analyses 

The first stage of the data analysis was descriptive in nature.
Statistics were computed for a variety of quantitative and qualitative
variables. Summary information on the sample of twenty-one MC studies is
given in Tables 1 and 2. The variables in these tables represent
empirical type I error values (TYPEI), skewness (SKEW), kurtosis (KURT),
total sample size (TOTALN), number of replications for the type I error
case (REPS1), number of replications for the power case (REPS2), number
of groups (NUMGRPS), ratio of largest to smallest sample sizes (SAMPLE),
ratio of largest to smallest variances (VARIANCE), and pairing of sample
size and variance (PAIRING).

Table 2 +@

Summary Statistics for Qualitative Variables
for the Sample of Monte Carlo Studies

Type of Population Score Distribution

Category            Frequency      %
Normal                    450   36.7
Uniform                    30    2.4
Dbl. Exponential           20    1.6
Log-normal                133   10.9
Cauchy                     20    1.6
Exponential               182   14.9
Logistic                   16    1.3
t                          15    1.2
Mixed-normal               67    5.5
Other                     292   23.8
Total                    1225    100

Number of Groups

Category            Frequency      %
2                         442   36.1
3                         194   15.8
4                         520   42.4
6                           9     .7
8                          60    4.9
Total                    1225    100

Ratio of Largest/Smallest Sample Size

Category            Frequency      %
Equal                     763   62.3
1-1.25                      0    0.0
1.25-1.5                   49    4.0
1.5-1.75                    0    0.0
1.75-2                     11     .9
2-3                       243   19.8
3-5                       158   12.9
>5                          1     .1
Total                    1225    100

Pairing of Sample Size/Variance

Category            Frequency      %
Other                    1005     82
Pos. Corr.                124   10.1
Neg. Corr.                 96    7.8
Total                    1225    100

Ratio of Largest/Smallest Variance

Category            Frequency      %
Equal                     878   71.7
1-2                       119    9.7
2-3                        42    3.4
3-5                        87    7.1
5-8                        39    3.2
>8                         60    4.9
Total                    1225    100

+ Dbl. Exponential = double exponential, Pos. Corr. = positively
correlated, Neg. Corr. = negatively correlated.
@ Table 2 results include power cases which were not considered in the inferential analyses.
The results in Table 1 indicate that the average type I error rate
across the sample of N = 553 type I error values was quite close to the
nominal value. A plot of the empirical TYPEI values appears in Figure
1. This distribution is noticeably skewed. Another interesting
statistic in Table 1 is the difference between the minimum and maximum
number of replications. This difference suggests results of varying
precision. Table 2 contains summary statistics for qualitative
variables.

Quantitative Analyses 

To construct and evaluate explanatory models, a fixed-effects
regression model was fitted to the empirical type I error values (see
Hedges & Olkin, 1985, p. 169). The fixed-effects regression models were
of the form

$$p_k = X_{k1}\beta_1 + X_{k2}\beta_2 + \cdots + X_{kT}\beta_T, \qquad k = 1, 2, \ldots, K \qquad (1)$$

where $p_k$ is the k(th) EM which depends on a set of T fixed explanatory
variables $X_{kt}$, and $\beta_t$ is a regression coefficient that captures the
relationship between the t(th) predictor variable and the k(th) EM (see
Harwell, 1990). In the present context, the empirical type I errors
served as the $p_k$ and the coded characteristics of the MC studies as the
$X_{kt}$.
Specified explanatory models were fitted to the EMs and a test of
the relationship between the set of T predictor variables and the $p_k$ was
performed using the weighted sum of squares due to regression statistic
$Q_R$ given in Hedges and Olkin (1985, p. 171). Under the hypothesis of no
relationship between the set of explanatory variables and the outcome
variable, $Q_R$ is approximately distributed as a chi-square with T degrees
of freedom. The squared multiple correlation coefficient was used as an
index of the explanatory power of a model. A test of model
misspecification (i.e., whether all of the explanatory variables needed
to explain variation in the $p_k$ are in the model) was performed using the
$Q_E$ statistic, also given in Hedges and Olkin (1985, p. 173). Under the
hypothesis of no model misspecification, $Q_E$ is approximately distributed
as a chi-square with K-T-1 degrees of freedom. All tests used an error
rate of .05. Listwise deletion of missing data reduced the number of
cases used in the analyses.
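The logic of the $Q_R$ and $Q_E$ statistics can be sketched for the simplest case of a single predictor (T = 1) plus an intercept. This is a sketch under stated assumptions rather than a reproduction of the analyses reported here: each EM is weighted by the inverse of an estimated binomial sampling variance $p_k(1-p_k)/R_k$, the weighted total sum of squares is partitioned into a regression part and an error part, and the chi-square tail area is computed only for one degree of freedom. All function names are hypothetical.

```python
import math


def weighted_simple_regression(p, x, reps):
    """Weighted least squares of EMs p on one predictor x.

    Assumption: weight_k = R_k / (p_k * (1 - p_k)), the inverse of the
    binomial sampling variance of the k-th empirical proportion.
    Returns (intercept, slope, q_r, q_e).
    """
    w = [r / (pk * (1.0 - pk)) for pk, r in zip(p, reps)]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    p_bar = sum(wi * pk for wi, pk in zip(w, p)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxp = sum(wi * (xi - x_bar) * (pk - p_bar) for wi, xi, pk in zip(w, x, p))
    slope = sxp / sxx
    intercept = p_bar - slope * x_bar
    q_total = sum(wi * (pk - p_bar) ** 2 for wi, pk in zip(w, p))
    q_r = slope * sxp          # weighted sum of squares due to regression
    q_e = q_total - q_r        # weighted residual sum of squares
    return intercept, slope, q_r, q_e


def chi2_sf_1df(q):
    """Upper tail area of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# Usage with made-up EMs that rise with a coded variance ratio.
rates = [0.048, 0.055, 0.061, 0.080]
ratio_code = [1, 2, 3, 5]
replications = [1000, 1000, 2000, 2000]
b0, b1, q_r, q_e = weighted_simple_regression(rates, ratio_code, replications)
print(b0, b1, q_r, chi2_sf_1df(q_r), q_e)
```

In this one-predictor setting $Q_R$ is referred to a chi-square with 1 degree of freedom and $Q_E$ to one with K - 2 degrees of freedom, which matches the T and K-T-1 degrees of freedom given above.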

Six explanatory models were investigated for the type I error case 
using the SPSSX (1983) computer program. Each model is discussed below. 
A summary of the results of the regression analyses appears in Table 3. 
Examination of the residuals of each of the models indicated no 
noticeable departures from normality. Note that all of the $Q_R$ and $Q_E$
statistics in Table 3 are significant at p < .001 and are often quite
large. Despite the misspecification of all of the models, the multiple
$R^2$ statistic appeared to be a useful index of the explanatory power of a
model.

Model 1

Model 1 investigated the relationship between type I errors and the
predictor variables SKEW, KURT, VARIANCE, TOTALN, NUMGRPS, SAMPLE, and
REPS1:

model 1a

TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + VARIANCE $\beta_3$ + TOTALN $\beta_4$ + NUMGRPS $\beta_5$ + SAMPLE $\beta_6$
+ REPS1 $\beta_7$

model 1b

TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + VARIANCE $\beta_3$ + TOTALN $\beta_4$ + NUMGRPS $\beta_5$ + SAMPLE $\beta_6$
+ REPS1 $\beta_7$ + 52 predictors representing two-variable-at-a-time and
three-variable-at-a-time interaction effects

Table 3 +

MODEL    T   Cases   $Q_R$        $Q_E$        $R^2$   $R^2_{adj}$
1a       7   416     2833.6*      26181.6*     .10     .08
1b      59   416     6786.9*      22228.3*     .23     .11
2a       5   416     2075.3*      26939.9*     .07     .06
2b       7   416     2833.6*      26181.6*     .10     .08
3a       6   416     2714.9*      26300.3*     .09     .08
3b       7   416     2833.6*      26181.6*     .10     .08
4a       7   416     2833.6*      26181.6*     .10     .08
4b       8   416     2872.9*      26142.3*     .10     .08
5a       7   149     2583.03*     22429.5*     .10     .06
5b       8   149     17909.3*     7103.2*      .72     .70
5c       7    76     4076.0*      3179.2*      .56     .52
5d       7    73     777.6*       237.6*       .77     .74
6a       5   195     1479.2*      2440.2*      .38     .36
6b       6   195     2658.9*      1260.5*      .68     .67

+ T = number of predictors, Cases = number of cases, $Q_R$ is the weighted
sum of squares due to regression statistic, $Q_E$ is a statistic testing
model misspecification, * means significant at p < .001, $R^2$ is the
squared correlation between the set of predictors and the outcome
variable, and $R^2_{adj}$ is $R^2$ adjusted for the number of predictors (see
Marascuilo & Serlin, 1988, p. 661).



In model 1a seven predictor variables were fitted to the TYPEI
values. The results in Table 3 indicate that there is a statistically
significant relationship between the set of seven predictor variables
and TYPEI. However, the $R^2_{adj}$ value of .08 suggests that the model
possesses little explanatory power. This modest relationship supports
the commonly held notion that the type I error rate of the F-test in the
oneway fixed-effects ANOVA model is robust. In the present context,
robustness would be indicated by a weak relationship between a set of
predictor variables and the outcome variable TYPEI. Model 1b was used to
investigate the relationship between interactions of predictor variables
and TYPEI with the effects of the seven original predictor variables held
constant. Fifty-two predictors representing almost all possible
two-variable-at-a-time and three-variable-at-a-time interactions among the T
= 7 predictors in 1a were entered after the seven original predictors.
Collinearity problems prohibited four of the interactions from being
entered into the model. Although the increase in the $Q_R$ statistic
between models 1a and 1b ($Q_{R\,1b} - Q_{R\,1a} = 3953.3$) is statistically
significant, the relatively small difference in the adjusted $R^2$s (.12)
suggests that the addition of the interaction effects only slightly
increased the explanatory power of the model. On the whole, the results
of model 1b suggest that the type I error rate of the F-test is
relatively insensitive to multiple assumption violations.
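The adjusted squared multiple correlations quoted from Table 3 can be related to the unadjusted values with a short sketch. It assumes the widely used adjustment $R^2_{adj} = 1 - (1 - R^2)(K - 1)/(K - T - 1)$, with K cases and T predictors; whether this is exactly the formula on the cited page of Marascuilo and Serlin (1988) is not confirmed here, although it reproduces several tabled values to two decimals.

```python
def adjusted_r2(r2, cases, predictors):
    """Adjusted R-squared under the assumed formula
    1 - (1 - R^2) * (K - 1) / (K - T - 1)."""
    return 1.0 - (1.0 - r2) * (cases - 1) / (cases - predictors - 1)


# Usage: rows (model, T, K, R^2) transcribed from Table 3.
rows = [("1a", 7, 416, 0.10), ("5b", 8, 149, 0.72), ("6a", 5, 195, 0.38), ("6b", 6, 195, 0.68)]
for label, t, k, r2 in rows:
    print(label, round(adjusted_r2(r2, k, t), 2))
```

Because the tabled $R^2$ values are themselves rounded, small discrepancies in the second decimal are to be expected for some models.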

Model 2 

Model 2 was used to investigate the effect of the shape of the 
population score distribution, as captured with skewness and kurtosis 
indices, on type I errors. The models investigated were: 

model 2a

TYPEI = VARIANCE $\beta_1$ + TOTALN $\beta_2$ + NUMGRPS $\beta_3$ + SAMPLE $\beta_4$ + REPS1 $\beta_5$

model 2b

TYPEI = VARIANCE $\beta_1$ + TOTALN $\beta_2$ + NUMGRPS $\beta_3$ + SAMPLE $\beta_4$ + REPS1 $\beta_5$ + SKEW $\beta_6$
+ KURT $\beta_7$

Comparing the results for models 2a and 2b indicates that the
$Q_{R\,2b} - Q_{R\,2a}$ difference was significant; however, the difference in the $R^2$s
suggests that type of population score distribution had little to do with
explaining variation in the type I errors. This result supports the
perception that the type I error rate of the F-test is robust to
departures from the assumption of normality of a population score
distribution.

Model 3

The effect of the number of replications variable on type I errors
was investigated in model 3. The models were:

model 3a

TYPEI = VARIANCE $\beta_1$ + TOTALN $\beta_2$ + NUMGRPS $\beta_3$ + SAMPLE $\beta_4$ + SKEW $\beta_5$ + KURT $\beta_6$

model 3b

TYPEI = VARIANCE $\beta_1$ + TOTALN $\beta_2$ + NUMGRPS $\beta_3$ + SAMPLE $\beta_4$ + SKEW $\beta_5$ + KURT $\beta_6$
+ REPS1 $\beta_7$

The results in Table 3 indicate that, with the other predictors held
constant, number of replications had a negligible impact on type I errors
($R^2_{adj\,3b} - R^2_{adj\,3a} = 0$).

Model 4 

In model 4 the possibility of a quadratic relationship between type
I error and total sample size was investigated. The rationale was that,
other factors held constant, as sample size increases the type I error
rate should converge toward its nominal value, but there may be a point
beyond which larger samples contribute little to this convergence. The
models were:

model 4a

TYPEI = VARIANCE $\beta_1$ + TOTALN $\beta_2$ + NUMGRPS $\beta_3$ + SAMPLE $\beta_4$ + SKEW $\beta_5$ + KURT $\beta_6$
+ REPS1 $\beta_7$

model 4b

TYPEI = VARIANCE $\beta_1$ + TOTALN $\beta_2$ + NUMGRPS $\beta_3$ + SAMPLE $\beta_4$ + SKEW $\beta_5$ + KURT $\beta_6$
+ REPS1 $\beta_7$ + TOTALN$^2$ $\beta_8$

The results in Table 3 suggest that there is no quadratic relationship
between sample size and type I errors.

Model 5 

The relationship between pairing unequal sample sizes and variances 
and type I errors was investigated in models 5a-5d. Theoretical and 
empirical work suggests that the meta-analysis should detect a strong
relationship between the set of predictor variables (including PAIRING)
and type I error. Models 5a and 5b were:

model 5a

TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + VARIANCE $\beta_3$ + TOTALN $\beta_4$ + NUMGRPS $\beta_5$ + SAMPLE $\beta_6$
+ REPS1 $\beta_7$

model 5b

TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + VARIANCE $\beta_3$ + TOTALN $\beta_4$ + NUMGRPS $\beta_5$ + SAMPLE $\beta_6$
+ REPS1 $\beta_7$ + PAIRING $\beta_8$

The results of these analyses, reported in Table 3, provide strong
evidence of the relationship between type I error and pairing. Model 5a
produces an $R^2_{adj}$ = .06 which is similar to that of model 1a. Note,
however, that model 5 analyses are restricted to MC results examining the
effects of pairing and thus are based on a smaller sample. Model 5b
includes the pairing variable and produces an $R^2_{adj}$ = .70. The
$R^2_{adj\,5b} - R^2_{adj\,5a} = .64$ and $Q_{R\,5b} - Q_{R\,5a} = 15326.3$ differences suggest a
strong relationship between type I errors and the pairing of unequal
sample sizes and variances, with the effects of the other predictor
variables held constant. These results are consistent with theoretical
and previous empirical evidence (Glass et al., 1972).
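The model comparisons reported for models 1 and 5 rest on simple differences between nested models. The sketch below recomputes those increments from the Table 3 entries. It assumes, as is customary for nested weighted regressions, that an increment in $Q_R$ is referred to a chi-square distribution with degrees of freedom equal to the number of added predictors; the dictionary and function names are hypothetical.

```python
# (number of predictors T, Q_R) transcribed from Table 3.
TABLE3 = {
    "1a": (7, 2833.6),
    "1b": (59, 6786.9),
    "5a": (7, 2583.03),
    "5b": (8, 17909.3),
}

CHI2_CRIT_1DF_05 = 3.841  # upper 5% point of a chi-square with 1 df


def q_r_increment(reduced, full):
    """Return (increase in Q_R, number of added predictors)."""
    t_reduced, q_reduced = TABLE3[reduced]
    t_full, q_full = TABLE3[full]
    return round(q_full - q_reduced, 1), t_full - t_reduced


# Usage: adding PAIRING to model 5a, and the interactions to model 1a.
delta_q, delta_df = q_r_increment("5a", "5b")
print(delta_q, delta_df, delta_q > CHI2_CRIT_1DF_05)
print(q_r_increment("1a", "1b"))
```

With a single added predictor, the increment for PAIRING exceeds the 1 degree of freedom critical value by several orders of magnitude, in line with the strong relationship described above.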

Specific evidence about the role of sample size and variance
pairings was examined through models 5c and 5d. Model 5c investigated
the relationship between type I error and the predictors skewness,
kurtosis, number of groups, total sample size, and number of replications
but was restricted to MC data in which sample sizes and variances were
positively correlated, e.g., smaller samples paired with smaller
variances; model 5d investigated this relationship when samples and
variances were negatively correlated, e.g., larger samples paired with
smaller variances. The models were:

model 5c (sample sizes/variances positively correlated)
TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + TOTALN $\beta_3$ + NUMGRPS $\beta_4$ + REPS1 $\beta_5$

model 5d (sample sizes/variances negatively correlated)
TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + TOTALN $\beta_3$ + NUMGRPS $\beta_4$ + REPS1 $\beta_5$

The results for models 5c and 5d in Table 3 suggest a strong
relationship between type I error and the explanatory models for positive
and negative pairing of sample sizes and variances.

Model 6 

Model 6 investigated the relationship between type I error and
heterogeneous variances when sample sizes are equal. All of the data
used in these analyses are based on equal sample sizes. The models were:

model 6a (equal sample sizes)

TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + TOTALN $\beta_3$ + NUMGRPS $\beta_4$ + REPS1 $\beta_5$

model 6b (equal sample sizes)

TYPEI = SKEW $\beta_1$ + KURT $\beta_2$ + TOTALN $\beta_3$ + NUMGRPS $\beta_4$ + REPS1 $\beta_5$ + VARIANCE $\beta_6$

The adjusted $R^2$ for model 6b (.67) and the difference $R^2_{adj\,6b} - R^2_{adj\,6a} = .35$
suggest a strong relationship between variance inequality and type I
error even though sample sizes are equal. Further evidence of this
effect is provided by examining the type I error means for the variance
condition variable when sample sizes are equal. This information is
presented in Table 4. Variance ratios greater than 2 produce a
noticeably inflated type I error rate, a pattern that is exacerbated as
the variance ratio increases. This pattern persists even if the variance
condition is restricted to ratios < 5. In this case, analysis
of a model identical to 5b (not given) produced an $R^2_{adj}$ = .54. These
results suggest that equal samples provide little protection against
inflated type I error rates when variances are heterogeneous.

Table 4

Average Type I Error Rates By Variance Ratios
For Equal Sample Sizes

VARIANCE      Equal   1-2    2-3    3-5    5-8    > 8
Mean          .046    .051   .060   .064   .0c4   .079
N of cases    144     7      15     15     6
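The kind of MC result being summarized here can be illustrated with a minimal simulation of the oneway F-test under equal sample sizes. This sketch is not a reproduction of any study in the sample: it assumes three normal populations with equal means, ten scores per group, and a tabled 5% critical value of about 3.354 for F with 2 and 27 degrees of freedom. The function names, seed, and replication count are arbitrary choices.

```python
import random

# Assumption: tabled upper 5% point of F(2, 27); valid only for
# three groups of ten scores each.
F_CRIT_2_27 = 3.354


def f_statistic(groups):
    """Oneway fixed-effects ANOVA F ratio for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def empirical_type1(sds, replications=2000, seed=1):
    """Proportion of rejections when all three population means are equal.

    sds holds the three population standard deviations; each group has
    ten normally distributed scores.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replications):
        groups = [[rng.gauss(0.0, sd) for _ in range(10)] for sd in sds]
        if f_statistic(groups) > F_CRIT_2_27:
            rejections += 1
    return rejections / replications


# Usage: equal variances versus a largest/smallest variance ratio of 16.
print(empirical_type1((1.0, 1.0, 1.0)))
print(empirical_type1((1.0, 1.0, 4.0)))
```

Each printed value corresponds to one EM in the sense used in this paper; comparing the two conditions over many replications is the sort of evidence that Table 4 aggregates across studies.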

Conclusions 

The results of the meta-analysis suggest the following conclusions: 

1. The type I error rate of the F-test is insensitive to type of 
population score distribution and relatively insensitive to 
combinations of violations of assumptions. 

2. There was no relationship between the number of replications and
type I errors, despite the large differences in these values across
studies.

3. There is no quadratic relationship between sample size and type I
errors.

4. There is a strong relationship between inverse pairing of sample
sizes and variances and type I error. This extends to the case in
which sample sizes and variances are positively paired.

5. There is a moderate relationship between the set of predictor 
variables and type I error when sample sizes are equal; for unequal 
sample sizes there is only a weak relationship. 

6. Equal sample sizes provide little protection against inflated type I 
error rates when variances are heterogeneous. This pattern is 
present for variance ratios < 5. 

Summary 

The application of quantitative methods of research synthesis to 
summarize Monte Carlo results shows great promise for informing
methodological practice. Construction of an empirical framework of Monte 
Carlo studies of a statistical test should result in guidelines for the 
appropriate use of particular statistical tests under specific assumption 
violations. This will also permit previous statistical analyses to be 
evaluated considering these guidelines. 

The present results suggest that meta-analytic methods can usefully
be applied to summarizing Monte Carlo results of particular statistical
tests. The results support the commonly held perception of the
robustness of the type I error rate of the oneway fixed-effects ANOVA
F-test for a variety of conditions. Nonnormal population score
distributions, different sample sizes, numbers of groups, and unequal
sample sizes had little effect on type I errors. However, the results of
the meta-analysis provide new evidence that researchers should not rely
on equal sample sizes to neutralize the effects of heterogeneous
variances. Under these conditions, the likely result is an inflated type
I error rate.

The next step in the process of deriving guidelines for using the F- 
test when assumptions are violated is to tease out more specific 
information from the explanatory models identified as being correlated 
with type I errors. The goal would be to identify conditions (e.g., 
variance inequality and sample size) associated with specific type I 
error values. This requires a more sophisticated methodology (e.g.,
response surface methodology). The same process should also be used to
investigate the relationship between various explanatory models and the
power of the F-test.